AITopics | visual policy

Collaborating Authors

visual policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accelerating Visual-Policy Learning through Parallel Differentiable Simulation

Neural Information Processing SystemsJun-17-2026, 01:57:45 GMT

In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Austria (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Energy (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

Prediction with Action: Visual Policy Learning via Joint Denoising Process

Neural Information Processing SystemsMar-22-2026, 12:44:10 GMT

Diffusion models have demonstrated remarkable capabilities in image generation tasks, including image editing and video creation, representing a good understanding of the physical world. On the other line, diffusion models have also shown promise in robotic control tasks by denoising actions, known as diffusion policy. Although the diffusion generative model and diffusion policy exhibit distinct capabilities--image prediction and robotic action, respectively--they technically follow similar denoising process. In robotic tasks, the ability to predict future images and generate actions is highly correlated since they share the same underlying dynamics of the physical world. Building on this insight, we introduce \textbf{PAD}, a novel visual policy learning framework that unifies image \textbf{P}rediction and robot \textbf{A}ction within a joint \textbf{D}enoising process. Specifically, PAD utilizes Diffusion Transformers (DiT) to seamlessly integrate images and robot states, enabling the simultaneous prediction of future images and robot actions.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Add feedback

Accelerating Visual-Policy Learning through Parallel Differentiable Simulation

You, Haoxiang, Liu, Yilang, Abraham, Ian

arXiv.org Artificial IntelligenceNov-12-2025

In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouple the rendering process from the computation graph, enabling seamless integration with existing differentiable simulation ecosystems without the need for specialized differentiable rendering software. This decoupling not only reduces computational and memory overhead but also effectively attenuates the policy gradient norm, leading to more stable and smoother optimization. We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. Notably, on complex tasks such as humanoid locomotion, our method achieves a $4\times$ improvement in final return, and successfully learns a humanoid running policy within 4 hours on a single GPU.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2505.10646

Country: Europe > Austria (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

Performance-Guided Refinement for Visual Aerial Navigation using Editable Gaussian Splatting in FalconGym 2.0

Miao, Yan, Yuceel, Ege, Fainekos, Georgios, Hoxha, Bardh, Okamoto, Hideki, Mitra, Sayan

arXiv.org Artificial IntelligenceOct-3-2025

Next, we further improve the architecture of [3] by removing IMU inputs and instead feeding a short history of past controls to the controller (blue box in Figure 2), which provides implicit temporal context. Next, the controller training follows a similar imitation learning procedure as in [3]: we first implement a state-based expert that flies through different tracks in simulation; at each timestep, we render the onboard RGB image and record the state-based controller's expert action. The RGB image is passed through the trained U-Net to obtain a binary mask, and we form supervised pairs where the masked image coupled with the past control actions are used to predict the current action to train the controller. Thanks to the Edit API, now we can synthesize essentially arbitrarily many tracks in FalconGym 2.0 to train both perception and controller without additional per-track real-world effort required by [1], [3], [5]. To sample efficiently, our unique design choice is to train on two-gate tracks. Intuitively, the initial state together with two successive gates spans the local geometric variability of longer courses; a controller that performs well on such segments could generalize well to multi-gate tracks by invariance and composition, as is empirically confirmed in Section IV. C. Performance-Guided Refinement Training A straightforward method to collect training data for the visual policy would be to uniformly sample the two-gate track space that is dynamically feasible and observable (as defined at the start of this section). However, uniform sampling can be sample-inefficient in a large high-dimensional workspace. With our Edit API, we can steer training data col- lection toward the visual policy's weak spots and iteratively refine to improve the visual policy.

artificial intelligence, falcongym 2, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.02248

Country:

Europe > Netherlands (0.28)
North America (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning Visual Parkour from Generated Images

Yu, Alan, Yang, Ge, Choi, Ran, Ravan, Yajvan, Leonard, John, Isola, Phillip

arXiv.org Artificial IntelligenceOct-31-2024

Fast and accurate physics simulation is an essential component of robot learning, where robots can explore failure scenarios that are difficult to produce in the real world and learn from unlimited on-policy data. Yet, it remains challenging to incorporate RGB-color perception into the sim-to-real pipeline that matches the real world in its richness and realism. In this work, we train a robot dog in simulation for visual parkour. We propose a way to use generative models to synthesize diverse and physically accurate image sequences of the scene from the robot's ego-centric perspective. We present demonstrations of zero-shot transfer to the RGB-only observations of the real world on a robot equipped with a low-cost, off-the-shelf color camera. website visit https://lucidsim.github.io

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.00083

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Singapore (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Leisure & Entertainment (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

SoloParkour: Constrained Reinforcement Learning for Visual Locomotion from Privileged Experience

Chane-Sane, Elliot, Amigo, Joseph, Flayols, Thomas, Righetti, Ludovic, Mansard, Nicolas

arXiv.org Artificial IntelligenceSep-20-2024

Parkour poses a significant challenge for legged robots, requiring navigation through complex environments with agility and precision based on limited sensory inputs. In this work, we introduce a novel method for training end-to-end visual policies, from depth pixels to robot control commands, to achieve agile and safe quadruped locomotion. We formulate robot parkour as a constrained reinforcement learning (RL) problem designed to maximize the emergence of agile skills within the robot's physical limits while ensuring safety. We first train a policy without vision using privileged information about the robot's surroundings. We then generate experience from this privileged policy to warm-start a sample efficient off-policy RL algorithm from depth images. This allows the robot to adapt behaviors from this privileged experience to visual locomotion while circumventing the high computational costs of RL directly from pixels. We demonstrate the effectiveness of our method on a real Solo-12 robot, showcasing its capability to perform a variety of parkour skills such as walking, climbing, leaping, and crawling.

constraint, locomotion, robot, (14 more...)

arXiv.org Artificial Intelligence

2409.13678

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York (0.04)
Europe > Portugal > Braga > Braga (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Domain Adaptation of Visual Policies with a Single Demonstration

Wang, Weiyao, Hager, Gregory D.

arXiv.org Artificial IntelligenceJul-23-2024

Weiyao Wang 1 and Gregory D. Hager 1 Abstract -- Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for vi-suomotor policies that utilize high-dimensional images as input, particularly when those images are generated via simulation. A common method to tackle this issue is through domain randomization, which aims to broaden the span of the training distribution to cover the test-time distribution. However, this approach is only effective when the domain randomization encompasses the actual shifts in the test-time distribution. We take a different approach, where we make use of a single demonstration (a prompt) to learn policy that adapts to the testing target environment. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose. Videos and more information can be viewed at project webpage: https://sites.google.com/view/promptadapt.

adaptation, arxiv preprint arxiv, demonstration, (14 more...)

arXiv.org Artificial Intelligence

2407.1682

Country: North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Chen, Zerui, Chen, Shizhe, Schmid, Cordelia, Laptev, Ivan

arXiv.org Artificial IntelligenceApr-24-2024

In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then rollout successful episodes from state-based policies and train a unified visual policy without using any privileged information. A coordinate transformation method is proposed to significantly boost the performance. We evaluate our method on three dexterous manipulation tasks and demonstrate a large improvement over state-of-the-art algorithms.

human video, trajectory, video, (16 more...)

arXiv.org Artificial Intelligence

2404.15709

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

Mu, Tongzhou, Guo, Yijie, Xu, Jie, Goyal, Ankit, Su, Hao, Fox, Dieter, Garg, Animesh

arXiv.org Artificial IntelligenceApr-10-2024

Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion), a general framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset. AdaDemo strategically collects new demonstrations to address the identified weakness in the existing policy, ensuring data efficiency is maximized. Through a comprehensive evaluation on a total of 22 tasks across two robotic manipulation benchmarks (RLBench and Adroit), we demonstrate AdaDemo's capability to progressively improve policy performance by guiding the generation of high-quality demonstration datasets in a data-efficient manner.

dataset, demonstration, demonstration dataset, (15 more...)

arXiv.org Artificial Intelligence

2404.07428

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Acar, Cihan, Binici, Kuluhan, Tekirdağ, Alp, Wu, Yan

arXiv.org Artificial IntelligenceDec-2-2023

The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, the hardware cost and design constraints in real-world scenarios can potentially make it challenging to use multiple cameras. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a pre-trained ``teacher'' policy trained with multiple camera viewpoints guides a ``student'' policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated both in simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.

camera view, student policy, teacher policy, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2023.3336245

2303.07026

Country:

Europe > Middle East > Republic of Türkiye > Tekirdag Province > Tekirdag (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback